Effective use of pause information in language modelling for speech recognition

Authors

Kengo Ohta

Masatoshi Tsuchiya

Seiichi Nakagawa

Abstract

This paper addresses the mismatch between the speech processing units used by a speech recognizer and the sentences of training corpora. A standard speech recognizer divides input speech into speech processing units based on its power information, whereas the training corpora of language models are divided into sentences based on punctuation. A mismatch between speech processing units and sentences is therefore inevitable, and neither is optimal for a spontaneous speech recognition task. This paper tackles the problem through two sub-issues. First, the words of the preceding units are used to predict the words of the succeeding units, in order to address the mismatch between speech processing units and optimal units. Second, we propose a method for building a language model that includes short pauses from a corpus that contains no short pauses, in order to address the mismatch between speech processing units and sentences. The combination of the two achieved a 4.5% relative improvement over the conventional method in a meeting speech recognition task.
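The abstract gives no implementation details, so the following is only a toy sketch of the first idea: carrying the last word of the preceding unit over as language-model context for the succeeding unit. The add-one-smoothed bigram model, the function names `train_bigram` and `log_prob`, and the example sentences are all assumptions made for illustration and are not the authors' method or data.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect bigram counts from tokenized sentences, using <s> as the start symbol."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def log_prob(unit, model, history="<s>"):
    """Add-one-smoothed bigram log-probability of a unit, starting from `history`."""
    context_counts, bigram_counts, vocab = model
    total, prev = 0.0, history
    for word in unit:
        numerator = bigram_counts[(prev, word)] + 1
        denominator = context_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
        prev = word
    return total


# Hypothetical training corpus, segmented into sentences by punctuation.
corpus = [
    ["we", "should", "start", "the", "meeting", "now"],
    ["the", "meeting", "starts", "at", "noon"],
]
model = train_bigram(corpus)

# Hypothetical recognizer output: a power-based split that cuts a sentence in two.
unit1 = ["we", "should", "start", "the"]
unit2 = ["meeting", "now"]

isolated = log_prob(unit2, model)                    # unit scored as if it began a sentence
carried = log_prob(unit2, model, history=unit1[-1])  # context carried over from unit1

print(f"isolated: {isolated:.3f}  carried: {carried:.3f}")
```

In this toy setting the second unit receives a higher score when the history is carried over, because "meeting" follows "the" in the training sentences but never follows the sentence-start symbol.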


Related resources

Pause Transfer in the Speech-to-Speech Translation Domain

In the speech-to-speech translation framework, automatic speech recognition and spoken language translation components provide additional information about the location of pauses in the source language. This information may be useful for improving the performance of pause prediction algorithms for speech synthesis. In this paper we propose a transfer algorithm based on tuples. The results show a be...


An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have shown their performance in speech recognition systems for both feature extraction and acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...


Developing a Standardized Medical Speech Recognition Database for Reconstructive Hand Surgery

Fast and holistic access to patients' clinical records is a major requirement of modern medical decision support systems (DSS). While electronic health records (EHRs) have replaced traditional paper-based records in most healthcare organizations, data entry into these systems remains largely manual. Speech recognition technology promises substitution of the more convenient speech-base...


Automatic Utterance Segmentation in Spontaneous Speech

As applications incorporating speech recognition technology become widely used, it is desirable to have such systems interact naturally with their users. For such natural interaction to occur, recognition systems must be able to accurately detect when a speaker has finished speaking. This research presents an analysis combining lower and higher level cues to perform the utterance endpointing tas...


A Comparison between Three Methods of Language Sampling: Freeplay, Narrative Speech and Conversation

Objectives: Spontaneous language sample analysis is an important part of the language assessment protocol. Language samples give us useful information about how children use language in the natural situations of daily life. The purpose of this study was to compare conversation, freeplay, and narrative speech in terms of Mean Length of Utterance (MLU), Type-token ratio (TTR), and the numbe...


Publication date: 2009
